From Field Preparation to Phenotype Information
The following steps must be completed prior to planting:
Note: These steps are carried out by Pauli lab members a few weeks before planting.
After completing these steps, the field will look like Figure 1.
Figure 1: South gantry field with shaped raised beds, sprinkler irrigation, and strings and stakes.
Lettuce planting generally occurs from mid-November to early December. The mean air temperature during this time has previously ranged from 10 °C to 22 °C (Figure 2).
Figure 2: Mean air temperature data collected by The Arizona Meteorological Network (AZMET). Orange vertical lines represent the day of planting. DOP, day of planting; S13, season 13; S15, season 15; S17, season 17.
Planting is done by hand using Earthway garden seed planters (Figure 3 Left). Lettuce seeds must be planted at a depth of 1/8 to 1/4 inch. The planting depth can be set using the adjustable screw at the bottom of the seed planter. Check the depth periodically throughout planting, as the screw can shift. Also, make sure that the chain at the bottom of the planter is not tangled, as it is meant to cover the seeds with soil after the planter penetrates the soil. If the chain is tangled, seeds will not be covered with soil and may therefore fail to germinate or be blown or washed away.
The Earthway planters were modified by fitting them with funnels and tubing that allow the user to hand-feed the small lettuce seeds instead of using the provided seed container and plates. Planting is carried out by members of the Pauli, Arnold, and Michelmore labs. People work in pairs: one person plants the seeds with the Earthway planter, and the other ensures that the correct plot numbers are being planted and that the correct seed is handed to the person planting (Figure 3 Right).
Figure 3: Lettuce hand planting. (Left) Earthway garden seed planter. (Right) One person planting using the Earthway planter, while the other is responsible for ensuring correct plot numbers and handing the correct seed to the person planting.
In past years, the tubing that feeds the seeds into the ground has gotten pinched or otherwise clogged. In these cases, entire columns were inadequately planted because the seed did not make it into the seed line of the expected plot. When this happens, Drs. Duke Pauli and Maria José Truco are notified and the plots within the affected column(s) are noted. If seed is not immediately available, Dr. Maria José Truco sends it from Davis, California.
The raw images and point clouds collected by the Field Scanalyzer show a high level of misalignment. To mitigate this error, a large number of ground control points (GCPs) are placed in the field. These GCPs include (Figure 4):
- White plastic bucket lids, four columns into the field on both east and west ends
- Umbrella holders with grey metal bucket lids, in the trench between columns four and five on both east and west ends
Figure 4: Ground control points (GCPs) used in the gantry field. (Left) White plastic bucket lid. (Right) Umbrella holder with grey metal bucket lid.
Each range contains a single white plastic bucket lid and two umbrella holders with grey metal bucket lids in the following arrangement (Figure 5):
Figure 5: Arrangement of ground control points (GCPs) in the gantry field. (Left) Each range contains a single white plastic bucket lid and two umbrella holders with grey metal bucket lids. (Right) White plastic bucket lids are alternated, to ensure robust geocorrection.
Thinning is a very important part of the field trial. The planters often result in clusters of seeds germinating close to each other. Thinning is conducted in two phases (Figure 6):
Figure 6: Change in plant density after multiple rounds of thinning. (Left) Plants after Phase 1 of thinning. (Right) Plants after Phase 2 of thinning.
The 10 individual plants resulting from Phase 2 should be equidistant. Equidistant placement reduces overlap with neighboring plants. This is an important step because the goal with the Field Scanalyzer data is to phenotype each plant individually. The farther apart plants are, the easier it is to phenotype them individually.
The Global Positioning System (GPS) coordinates of each GCP must be collected so they can be used in PhytoOracle workflows. To accomplish this, you need a Trimble Global Navigation Satellite System (GNSS) (Figure 7).
Figure 7: Trimble Global Navigation Satellite System (GNSS) used to collect accurate Global Positioning System (GPS) coordinates of Ground Control Points (GCPs).
The United States Department of Agriculture (USDA) Arid Land Agricultural Research Center (ALARC) has Trimble units that we can borrow. To use them, follow the steps below:
Run Trimble Access - Press the Trimble hard key (Windows symbol), then select Trimble Access
Log in - Click either “Tap here to log in” or the name of the currently logged-in user (e.g., kelly.thorp)
PhytoOracle relies on geospatial information, such as GPS coordinates, to accurately link phenotypes with a location in the field. This allows us to detect, tag, and track individual plants over the course of multiple Field Scanalyzer scans. Specifically, PhytoOracle requires two files:
These files must be generated prior to data processing for the respective season. Additionally, these files should be loaded onto QGIS for visual inspection and confirmation that the coordinates are accurate.
The Trimble collects GPS coordinates in Easting, Northing format (Table 1). PhytoOracle requires GPS coordinates in latitude, longitude format. To convert the coordinates, use the conversion tool in the gcp_coordinates_conversion repository. After running the conversion script, the data will be in the required latitude, longitude format (Table 2).
| GCP | Type | Northing | Easting | Height (m) |
|---|---|---|---|---|
| plate1 | White | 3659979 | 408992.8 | 360.775 |
| plate2 | White | 3659987 | 408992.9 | 360.788 |
| plate3 | White | 3659995 | 408992.9 | 360.783 |
| plate4 | White | 3660003 | 408993.0 | 360.770 |
| plate5 | White | 3660011 | 408993.1 | 360.775 |
| plate6 | White | 3660019 | 408993.1 | 360.765 |
| GCP number | Latitude | Longitude |
|---|---|---|
| 1 | 33.07470 | -111.975 |
| 2 | 33.07478 | -111.975 |
| 3 | 33.07485 | -111.975 |
| 4 | 33.07492 | -111.975 |
| 5 | 33.07499 | -111.975 |
| 6 | 33.07506 | -111.975 |
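The conversion from Easting, Northing to latitude, longitude can be sketched in plain Python. This is not the gcp_coordinates_conversion script; it is a minimal illustration using the standard series-expansion inverse of the transverse Mercator projection. It assumes the Trimble output is in UTM zone 12 North on the WGS84 ellipsoid, which is consistent with the values in Tables 1 and 2 but is not stated in the source. The function name utm_to_latlon is hypothetical.

```python
import math

def utm_to_latlon(easting, northing, zone, northern=True):
    """Convert UTM coordinates (metres) to (latitude, longitude) in degrees.

    Assumes the WGS84 ellipsoid; the zone and hemisphere must be supplied.
    """
    a = 6378137.0                 # WGS84 semi-major axis (m)
    f = 1 / 298.257223563         # WGS84 flattening
    k0 = 0.9996                   # UTM scale factor on the central meridian
    e2 = f * (2 - f)              # first eccentricity squared
    ep2 = e2 / (1 - e2)           # second eccentricity squared

    x = easting - 500000.0        # remove the false easting
    y = northing if northern else northing - 10000000.0

    # Footpoint latitude from the meridional arc length.
    m = y / k0
    mu = m / (a * (1 - e2 / 4 - 3 * e2**2 / 64 - 5 * e2**3 / 256))
    e1 = (1 - math.sqrt(1 - e2)) / (1 + math.sqrt(1 - e2))
    phi1 = (mu
            + (3 * e1 / 2 - 27 * e1**3 / 32) * math.sin(2 * mu)
            + (21 * e1**2 / 16 - 55 * e1**4 / 32) * math.sin(4 * mu)
            + (151 * e1**3 / 96) * math.sin(6 * mu)
            + (1097 * e1**4 / 512) * math.sin(8 * mu))

    sin1, cos1, tan1 = math.sin(phi1), math.cos(phi1), math.tan(phi1)
    n1 = a / math.sqrt(1 - e2 * sin1**2)            # prime vertical radius
    r1 = a * (1 - e2) / (1 - e2 * sin1**2) ** 1.5   # meridional radius
    t1 = tan1**2
    c1 = ep2 * cos1**2
    d = x / (n1 * k0)

    lat = phi1 - (n1 * tan1 / r1) * (
        d**2 / 2
        - (5 + 3 * t1 + 10 * c1 - 4 * c1**2 - 9 * ep2) * d**4 / 24
        + (61 + 90 * t1 + 298 * c1 + 45 * t1**2 - 252 * ep2 - 3 * c1**2) * d**6 / 720)

    lon0 = math.radians((zone - 1) * 6 - 180 + 3)   # central meridian of the zone
    lon = lon0 + (
        d
        - (1 + 2 * t1 + c1) * d**3 / 6
        + (5 - 2 * c1 + 28 * t1 - 3 * c1**2 + 8 * ep2 + 24 * t1**2) * d**5 / 120) / cos1

    return math.degrees(lat), math.degrees(lon)

# First row of Table 1 (plate1); zone 12 North is an assumption.
lat, lon = utm_to_latlon(408992.8, 3659979.0, zone=12)
print(round(lat, 5), round(lon, 3))
```

The printed values should be comparable to the first row of Table 2. In practice, a projection library such as pyproj is usually a better choice than hand-written formulas, and the correct coordinate reference system should be confirmed from the Trimble job settings.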
GeoJSON files contain polygons that represent each plot in the gantry field (Figure 8). These polygons are used to extract smaller experimental units from larger units, such as the full field scale.
Figure 8: GeoJSON file containing a single polygon for each plot.
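To illustrate how plot polygons can link a location to a plot, the sketch below runs a ray-casting point-in-polygon test against a tiny GeoJSON-like structure. It is only an illustration of the idea, not PhytoOracle's own implementation. The polygon coordinates, the "plot" property name, and the genotype labels are hypothetical; only the "genotype" property is named in this document.

```python
def point_in_ring(lon, lat, ring):
    """Ray-casting test: is (lon, lat) inside a closed ring of [lon, lat] pairs?"""
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the point's latitude can be crossed.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside

def find_plot(lon, lat, geojson):
    """Return the properties of the first plot polygon containing the point."""
    for feature in geojson["features"]:
        exterior = feature["geometry"]["coordinates"][0]  # exterior ring
        if point_in_ring(lon, lat, exterior):
            return feature["properties"]
    return None

def rectangle(west, south, east, north):
    """Build a closed GeoJSON polygon ring for a rectangle."""
    return [[[west, south], [east, south], [east, north],
             [west, north], [west, south]]]

# Two hypothetical, adjacent plots (coordinates are made up for illustration).
plots = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"plot": 1, "genotype": "hypothetical_A"},
         "geometry": {"type": "Polygon",
                      "coordinates": rectangle(-111.97500, 33.07470, -111.97490, 33.07474)}},
        {"type": "Feature",
         "properties": {"plot": 2, "genotype": "hypothetical_B"},
         "geometry": {"type": "Polygon",
                      "coordinates": rectangle(-111.97500, 33.07474, -111.97490, 33.07478)}},
    ],
}

print(find_plot(-111.97495, 33.07476, plots))
```

The same lookup is what a geospatial library does with a spatial join; the hand-written version is shown only to make the mechanism visible.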
Our field design and dimensions remain largely consistent from one season to the next. As a result, existing GeoJSONs are modified and applied to new seasons. If a new GeoJSON needs to be created, please refer to FIELDimageR.
If you are editing a pre-existing GeoJSON, you will need to:
To move polygons, you need to load the GeoJSON and a drone orthomosaic onto QGIS. Then, you can follow the steps in Figure 9:
Figure 9: Editing GeoJSON polygons using QGIS.
The “genotype” values in the GeoJSON file can be edited using GeoPandas. A GeoJSON can be opened as a GeoDataFrame, which behaves much like a Pandas DataFrame. Once opened, you can replace the values in the “genotype” column using the fieldbook for the respective season. To see an example click here.
The PhytoOracle (PO) pipelines require the aforementioned GCP and GeoJSON files. Additionally, a Yet Another Markup Language (YAML) file is used by PO for automated, reproducible data processing. YAML files are a type of configuration file that can be used to define a series of arguments/flags. The details of the YAML files can be found in our PhytoOracle Automation repository.
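The genotype replacement can be sketched as follows. The text names GeoPandas for this step; the sketch below deliberately swaps in the standard-library json module so that it runs without extra dependencies, and it is not the linked example. The "plot" property name and the structure of the fieldbook (a mapping from plot number to genotype) are assumptions, as are all values shown.

```python
import json

def update_genotypes(geojson, fieldbook, plot_key="plot"):
    """Return (updated_copy, plots_missing_from_fieldbook).

    Each feature's "genotype" is replaced with the fieldbook entry for its plot.
    Features whose plot is not in the fieldbook are left unchanged and reported.
    """
    updated = json.loads(json.dumps(geojson))  # deep copy; input is not modified
    missing = []
    for feature in updated["features"]:
        props = feature["properties"]
        plot = props.get(plot_key)
        if plot in fieldbook:
            props["genotype"] = fieldbook[plot]
        else:
            missing.append(plot)
    return updated, missing

# Hypothetical GeoJSON from a previous season (geometry omitted for brevity).
previous = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"plot": 1, "genotype": "old_A"}, "geometry": None},
        {"type": "Feature", "properties": {"plot": 2, "genotype": "old_B"}, "geometry": None},
        {"type": "Feature", "properties": {"plot": 3, "genotype": "old_C"}, "geometry": None},
    ],
}

# Hypothetical fieldbook for the new season: plot number -> genotype.
fieldbook = {1: "new_X", 2: "new_Y"}

updated, missing = update_genotypes(previous, fieldbook)
print([f["properties"]["genotype"] for f in updated["features"]])
print("Plots not found in the fieldbook:", missing)
```

Reporting the plots that have no fieldbook entry, rather than silently keeping the old genotype, makes it easier to catch a stale label before processing starts.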
For each season, YAML files must be edited to correctly process data for the respective season.
Specifically, the following keys should be edited for each season:
Examples of YAMLs for each season can be found here.
At the start of a new season, the phytooracle_automation repo must be updated. Specifically, the season_config_yaml variable needs to be updated for the new season.
The season is defined by multiple keys, including name, start_date, end_date, flir_temp_units, and complete_field_dates (Figure 10).
Below are some details for each key:
Figure 10: Section of the season_config_yaml variable in the phytooracle_data GitHub repository.
To get a list of RGB dates, use the iRODS ils command to list the directory of the respective season (Figure 11).
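A quick way to catch an incomplete season entry before committing it is to check that every key listed above is present. The sketch below assumes one season entry can be represented as a Python dictionary; the helper name missing_season_keys and the placeholder values are hypothetical, and the expected value formats should be taken from Figure 10 rather than from this example.

```python
# Keys named in this section for each season entry.
REQUIRED_SEASON_KEYS = (
    "name",
    "start_date",
    "end_date",
    "flir_temp_units",
    "complete_field_dates",
)

def missing_season_keys(season_entry):
    """Return the required keys that are absent from one season entry."""
    return [key for key in REQUIRED_SEASON_KEYS if key not in season_entry]

# Hypothetical, incomplete draft: the values are placeholders, not real settings.
draft_entry = {
    "name": "<season name>",
    "start_date": "<start date>",
    "end_date": "<end date>",
}

print(missing_season_keys(draft_entry))
# -> ['flir_temp_units', 'complete_field_dates']
```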
Figure 11: Getting RGB dates using iRODS.
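If the listing is long, the dates can be pulled out of the ils output with a few lines of Python. The sketch below assumes that each scan date is a sub-collection whose name contains a date in YYYY-MM-DD form; that naming, the collection path, and the dates in the sample listing are all hypothetical. The only detail taken from iRODS itself is that ils prefixes sub-collections with "C- ".

```python
import re

DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")  # assumed YYYY-MM-DD naming

def dates_from_ils(listing):
    """Extract sorted, unique dates from the sub-collection lines of ils output."""
    dates = set()
    for line in listing.splitlines():
        line = line.strip()
        if not line.startswith("C- "):
            continue  # skip the header line and plain data objects
        name = line.rsplit("/", 1)[-1]  # last path component
        match = DATE_PATTERN.search(name)
        if match:
            dates.add(match.group(0))
    return sorted(dates)

# Hypothetical ils output saved to a string (path and dates are made up).
sample_listing = """\
/zone/home/project/season_XX/stereoTop:
  C- /zone/home/project/season_XX/stereoTop/2023-01-12
  C- /zone/home/project/season_XX/stereoTop/2023-01-05
  notes.txt
"""

print(dates_from_ils(sample_listing))
# -> ['2023-01-05', '2023-01-12']
```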
The PhytoOracle 3d_landmark_selection container contains the phytooracle_data repository. As such, the 3d_landmark_selection container must be rebuilt once the changes described above have been made to the phytooracle_data repository. To rebuild it, click “Trigger” for the “latest” container on DockerHub (Figure 12).
Figure 12: Rebuilding the 3d_landmark_selection container on DockerHub.
The University of Arizona maintains an HPC center, which houses three compute resources: El Gato, Ocelote, and Puma.
| Name | El Gato | Ocelote | Puma |
|---|---|---|---|
| Model | IBM System X iDataPlex dx360 M4 | Lenovo NeXtScale nx360 M5 | Penguin Altus XE2242 |
| Node Count | 131 | 400 | 236 CPU-only, 8 GPU, 2 High-memory |
| Total System Memory (TB) | 26TB | 82.6TB | 128TB |
| Processors | 2x Xeon E5-2650v2 8-core (Ivy Bridge) | 2x Xeon E5-2695v3 14-core (Haswell); 2x Xeon E5-2695v4 14-core (Broadwell); 4x Xeon E7-4850v2 12-core (Ivy Bridge) | 2x AMD EPYC 7642 48-core (Rome) |
| Cores / Node (schedulable) | 16c | 28c (48c - High-memory node) | 94c |
| Total Cores | 2160* | 11528* | 23616* |
| Processor Speed | 2.66GHz | 2.3GHz (2.4GHz - Broadwell CPUs) | 2.4GHz |
| Memory / Node | 256GB - GPU nodes; 64GB - CPU-only nodes | 192GB (2TB - High-memory node) | 512GB (3TB - High-memory nodes) |
| Accelerators | | 46 NVIDIA P100 (16GB) | 29 NVIDIA V100S |
| /tmp | ~840 GB spinning (/tmp is part of root filesystem) | ~840 GB spinning (/tmp is part of root filesystem) | ~1440 TB NVMe /tmp |
| HPL Rmax (TFlop/s) | 46 | 382 | |
| OS | CentOS 7 | CentOS 7 | CentOS 7 |
The UArizona HPC provides three types of resources:
*Note: High priority is only available for the Puma cluster.
PhytoOracle is a scalable, modular phenomics data processing workflow manager. In short, this means that PhytoOracle can leverage high-performance computing (HPC) clusters and cloud computing to distribute tasks across hundreds to thousands of cores.
Resources are defined in the workload_manager section of the PhytoOracle YAML. In this section, you can define many compute resource settings. Below is an example:
There are a few things you must ensure before deploying PhytoOracle:
- Confirm existence and accuracy of GCP file
  - Visually inspect on QGIS. Confirm correct placement of GCPs by overlaying the points with an RGB orthomosaic, either drone or gantry.
- Confirm existence and accuracy of GeoJSON file
  - Visually inspect on QGIS. Check plot number sequence and genotype values.
If these steps are not followed, errors can propagate to multiple levels of data processing, requiring a reprocessing of data.
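Some of these checks can be partly automated before the visual inspection. The sketch below is not part of PhytoOracle and does not replace inspection in QGIS. It flags GCPs that fall outside an expected bounding box (which catches swapped latitude and longitude) and flags GeoJSON features with missing genotypes, duplicate plot numbers, or gaps in the plot sequence. The "plot" property name, integer plot numbering, the bounding box, and the sample data are assumptions.

```python
def check_gcps(gcps, lat_range, lon_range):
    """Flag GCPs whose latitude or longitude is outside the expected field bounds."""
    problems = []
    for number, lat, lon in gcps:
        if not lat_range[0] <= lat <= lat_range[1]:
            problems.append(f"GCP {number}: latitude {lat} outside {lat_range}")
        if not lon_range[0] <= lon <= lon_range[1]:
            problems.append(f"GCP {number}: longitude {lon} outside {lon_range}")
    return problems

def check_plots(geojson, plot_key="plot"):
    """Flag missing genotypes, duplicate plot numbers, and gaps in the plot sequence."""
    problems = []
    plots = []
    for index, feature in enumerate(geojson["features"]):
        props = feature.get("properties") or {}
        if not props.get("genotype"):
            problems.append(f"feature {index}: missing genotype")
        if props.get(plot_key) is None:
            problems.append(f"feature {index}: missing plot number")
        else:
            plots.append(props[plot_key])
    duplicates = sorted(p for p in set(plots) if plots.count(p) > 1)
    if duplicates:
        problems.append(f"duplicate plot numbers: {duplicates}")
    if plots:
        gaps = sorted(set(range(min(plots), max(plots) + 1)) - set(plots))
        if gaps:
            problems.append(f"missing plot numbers: {gaps}")
    return problems

# GCP 1 is taken from Table 2; GCP 2 has latitude and longitude swapped on purpose.
gcps = [(1, 33.07470, -111.975), (2, -111.975, 33.07478)]
print(check_gcps(gcps, lat_range=(33.07, 33.08), lon_range=(-111.98, -111.97)))

# Hypothetical plot file with an empty genotype and a skipped plot number.
plot_file = {"features": [
    {"properties": {"plot": 1, "genotype": "A"}},
    {"properties": {"plot": 2, "genotype": "B"}},
    {"properties": {"plot": 4, "genotype": ""}},
]}
print(check_plots(plot_file))
```

An empty list from both functions means only that these basic checks passed; placement errors of a few meters still need to be caught against the orthomosaic.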
The Field Scanner collects two-dimensional (2D) and three-dimensional (3D) data types, including scanner3DTop (3D), stereoTop (RGB), ps2Top (fluorescence), and flirIrCamera (thermal) (Figure 13).
Figure 13: Data types collected by the Field Scanner. Two-dimensional (2D) data types include RGB, fluorescence, and thermal images, while three-dimensional (3D) include 3D point clouds.
The 2D data collected by the Field Scanner include stereoTop (RGB), flirIrCamera (thermal), and ps2Top (fluorescence). These data are processed relatively quickly because they are much smaller than the three-dimensional (3D) data. The processing of 2D data types is fully developed for both lettuce and sorghum (Figure 14).
Figure 14: Visualization of 2D data processing by PhytoOracle.
The major goal of the PhytoOracle project is to phenotype individual plants at a high spatial-temporal scale. To accomplish this, the individual plant positioning information (GPS coordinates) collected during 2D data processing is leveraged to extract data from the 3D data (Figure 15).
Figure 15: Visualization of 3D data processing by PhytoOracle.
As such, much focus has been placed on 3D point cloud data. These data undergo intensive processing to extract individual plant point clouds (Figure 16).
Figure 16: Individual plant point clouds processed by PhytoOracle.
After (i) checking the GCP and GeoJSON files and (ii) generating a YAML file, as described in the preceding sections, you are ready to run PhytoOracle.
PhytoOracle is made up of multiple workflows to process two-dimensional (2D) and three-dimensional (3D) data (Figure 17). These workflows allow for automated, scalable processing of raw data collected by the Field Scanner. The data processing results in high spatial-temporal phenotype information.
Figure 17: PhytoOracle workflows for processing raw data collected by the Field Scanner.
PhytoOracle is mainly deployed on the UArizona HPC. The next sections provide a brief description of how to run each workflow. For additional details, please refer to the PhytoOracle publication. In all cases, the commands provided will automatically handle all steps of processing, including:
The stereoTop workflow runs image stitching and plant detection, resulting in the extraction of bounding area and GPS coordinate information for each plant. The workflow is run as follows:
sbatch shell_scripts/slurm_submission_large.sh <yaml_file>
For example, if you wanted to run this for season 15:
sbatch shell_scripts/slurm_submission_large.sh yaml_files/season_15/stereoTop_level01_s15.yaml
The flirIrCamera workflow is run as follows:
sbatch shell_scripts/slurm_submission_large.sh <yaml_file>
For example, if you wanted to run this for season 15:
sbatch shell_scripts/slurm_submission_large.sh yaml_files/season_15/flirIrCamera_level01_s15.yaml
The ps2Top workflow is run as follows:
sbatch shell_scripts/slurm_submission_large.sh <yaml_file>
For example, if you wanted to run this for season 15:
sbatch shell_scripts/slurm_submission_large.sh yaml_files/season_15/ps2Top_level01_s15.yaml
The scanner3DTop workflow runs point cloud stitching and leverages GPS coordinates collected during stereoTop processing, resulting in the extraction of traditional and topological shape descriptors for each plant. This workflow involves multiple levels of processing, including:
sbatch shell_scripts/slurm_submission.sh yaml_files/season_15/scanner3DTop_level01_s15.yaml
sbatch shell_scripts/slurm_submission.sh yaml_files/season_15/scanner3DTop_level02_s15.yaml
*Note: scanner3DTop level 1 and 2 processing uses shell_scripts/slurm_submission.sh instead of shell_scripts/slurm_submission_large.sh. This is because the manager node performs no processing; it only provides the tasks and sends them to worker nodes. As such, the manager node requires only two cores instead of 94.
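When several workflows need to be submitted for a season, the commands can be assembled programmatically to avoid typos in the YAML paths. The sketch below only builds the command; it does not submit anything. The file-naming pattern is inferred from the season 15 examples in this section and may not hold for every season, and the helper name sbatch_command is hypothetical.

```python
LARGE_SCRIPT = "shell_scripts/slurm_submission_large.sh"  # used for the 2D workflows
SMALL_SCRIPT = "shell_scripts/slurm_submission.sh"        # used for scanner3DTop

def sbatch_command(sensor, season, level=1):
    """Build the sbatch argument list for one workflow.

    Assumes YAML files follow the pattern seen in the season 15 examples:
    yaml_files/season_<N>/<sensor>_level<LL>_s<N>.yaml
    """
    script = SMALL_SCRIPT if sensor == "scanner3DTop" else LARGE_SCRIPT
    yaml_file = f"yaml_files/season_{season}/{sensor}_level{level:02d}_s{season}.yaml"
    return ["sbatch", script, yaml_file]

# Print the commands for the season 15 workflows described above.
jobs = [("stereoTop", 1), ("flirIrCamera", 1), ("ps2Top", 1),
        ("scanner3DTop", 1), ("scanner3DTop", 2)]
for sensor, level in jobs:
    print(" ".join(sbatch_command(sensor, 15, level)))
```

On an HPC login node, the returned list could be passed to subprocess.run to submit the job; printing the commands first makes it easy to confirm that each YAML file exists.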
Although PhytoOracle is reproducible due to the use of containers and YAML configuration files, it is important to follow quality control (QC) and quality assurance (QA) steps after data processing. The recommended steps for this are:
If any errors are spotted during these QA/QC steps, immediately notify the project lead. Depending on the impact of the error, data may need to be reprocessed to ensure data integrity.